Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fp8 support for SD/Update notebook paths #8387

Closed
wants to merge 66 commits into from

Conversation

Victor49152
Copy link
Collaborator

What does this PR do ?

Add FP8 support for SD training.

Collection: [multimodal/text_to_image/statble_diffusion]

Changelog

  • Add config argument - use_te_fp8 to switch between fp16 and fp8 training
  • Add transformer engine fp8 layers to Stable diffusion Unet
  • Update a few legacy paths in notebook tutorial

Usage

unet_config:
    use_te_fp8=True

Jenkins CI

To run Jenkins, a NeMo User with write access must comment jenkins on the PR.

Before your PR is "Ready for review"

Pre checks:

  • [x ] Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • [x ] New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

XuesongYang and others added 24 commits January 29, 2024 15:46
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
* Fix docs build

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix mock imports

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comment

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
* add notebook



* rename old notebook to Buffered_Streaming



* call setup_streaming_params in set_default_att_context_size method



* update links in docs



* update links to tutorials in docs



* remove hard-coding



* rename var



---------

Signed-off-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
…8242)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuations

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Method -> classmethod (self is not needed)

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
* updated online sample mapping

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix sample idx return

Signed-off-by: arendu <[email protected]>

* adding online sample mapping as default for sft dataset

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed pdeprecated eft model

Signed-off-by: arendu <[email protected]>

* import fix

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
…megaconf (#8299) (#8319)

* save cp_size to self



* use parallel_state instead of self



---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
* Initial support for saving unpacked nemo file directly and support for uploading NeMo models to Huggingface Hub

Signed-off-by: smajumdar <[email protected]>

* Add support to restore nemo models from HF in unpacked format

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct input types for model card

Signed-off-by: smajumdar <[email protected]>

* Update License Section to template

Signed-off-by: smajumdar <[email protected]>

* Add unit test to restore via unpacked checkpoint

Signed-off-by: smajumdar <[email protected]>

* Fix typo

Signed-off-by: smajumdar <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add docs and address comments

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
* Temp commit

Signed-off-by: smajumdar <[email protected]>

* Temp transcription api file

Signed-off-by: smajumdar <[email protected]>

* Initial draft of transcribable API

Signed-off-by: smajumdar <[email protected]>

* Udpate ASR nd Classiication model transcription functions

Signed-off-by: smajumdar <[email protected]>

* Commit draft - works on CTC/RNNT/Hybrid/SpeechClassification

Signed-off-by: smajumdar <[email protected]>

* Upate transcription tests

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix minor typos

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update tests

Signed-off-by: smajumdar <[email protected]>

* Try reverting numba fix

Signed-off-by: smajumdar <[email protected]>

* Revert patch

Signed-off-by: smajumdar <[email protected]>

* Formatting

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add full support for tensor input to model.transcribe()

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Make BS 2 for mixed tensor inference

Signed-off-by: smajumdar <[email protected]>

* Update return type signature and add return type tests. Fix decoder_timestamps_utils.py

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix cast to numpy

Signed-off-by: smajumdar <[email protected]>

* Update transcribe speech script

Signed-off-by: smajumdar <[email protected]>

* Fix signature to transcribe()

Signed-off-by: smajumdar <[email protected]>

* Address comments, fix issues with paths2audio_files

Signed-off-by: smajumdar <[email protected]>

* Fix typo

Signed-off-by: smajumdar <[email protected]>

* Fix logprobs signature for ctc segmentation

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address comments

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address comments and add docs

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove old logprobs

Signed-off-by: smajumdar <[email protected]>

* Fix issues with regression model and add regression config. Remove `return_generator` arg

Signed-off-by: smajumdar <[email protected]>

* Add AED tests

Signed-off-by: smajumdar <[email protected]>

* Add support for Canary models

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor prompt format for lhotse

Signed-off-by: smajumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
* Unfinished checkpoints handling + tests

Signed-off-by: Jacek Bieniusiewicz <[email protected]>

* Fixed EMA checkpoint tests

Signed-off-by: Jacek Bieniusiewicz <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes after review

Signed-off-by: Jacek Bieniusiewicz <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Merged latest main

Signed-off-by: Jacek Bieniusiewicz <[email protected]>

* Removed not used barrier_before and barrier_after params

Signed-off-by: Jacek Bieniusiewicz <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cosmetic change: removed redundant commas

Signed-off-by: Jacek Bieniusiewicz <[email protected]>

---------

Signed-off-by: Jacek Bieniusiewicz <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abhishree Thittenamane <[email protected]>
* Only reduce amaxes after fp8 cast for last distopt bucket

Signed-off-by: Tim Moon <[email protected]>

* Handle case with FP8 and contiguous param buffer

Signed-off-by: Tim Moon <[email protected]>

* Support distopt buckets with mixed dtypes

Signed-off-by: Tim Moon <[email protected]>

* Fix bug where fp8 casts were being skipped

Signed-off-by: Tim Moon <[email protected]>

* Debug FP8 params with contiguous param buffer

Signed-off-by: Tim Moon <[email protected]>

* Separate non-FP8 params into leftover distopt bucket

Signed-off-by: Tim Moon <[email protected]>

* Debug FP8 params with contiguous param buffer

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Make sure to update FP8 transpose cache

Signed-off-by: Tim Moon <[email protected]>

* Update Apex commit

Avoid unnecessary FP8 weight transposes.

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* add multitaskAED longform infer

Signed-off-by: stevehuang52 <[email protected]>

* refactor and changed default beam=1 and len_pen=0

Signed-off-by: stevehuang52 <[email protected]>

* revert default len_pen=1.0

Signed-off-by: stevehuang52 <[email protected]>

* refactor

Signed-off-by: stevehuang52 <[email protected]>

* refactor

Signed-off-by: stevehuang52 <[email protected]>

* update doc

Signed-off-by: stevehuang52 <[email protected]>

* fix autocast

Signed-off-by: stevehuang52 <[email protected]>

* fix typo in docstring

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
…_to_text_aed model (#8368) (#8383)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
pre-commit-ci bot and others added 3 commits February 9, 2024 18:07
* Bugfix

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Undo "fixes" of the bot

Signed-off-by: Jan Baczek <[email protected]>

---------

Signed-off-by: Jan Baczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <[email protected]>
* Add Finetuning tutorial with HF Datasets



* update on Som comments



---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
minitu and others added 18 commits February 15, 2024 15:36
* add deallocate pipeline output optimization



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
…>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
* Logging changes tested for gpt_pretraining



* Additional args



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
* Turn on drop last



* Some neva fixes



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
* Add dist ckpt support for regular optimizers



* [tutorial] fixed missing RIR scripts file. (#8257)



* fix imports



* imports fix



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook



* revert asr notebook



---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Rename quick-gelu



* ddpm config guard



* Fix ddpm edit api



* Fix insert_image_token cfg issue



* neva updates



* reformat



* Add back jenkins



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs



* Update default neva template



---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* [tutorial] fixed missing RIR scripts file. (#8257)



* add values to en tts dict (#7879)



* mcore ds fix



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore



* revert asr files



* add comments



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset



* update mcore version



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg



* update mcore commit



* fix Bert unit tests



* update bert tests



* fix bert mcore test



* fix gpt jenkins tests



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits



* revert apex installation



* turn off the fusion for jenkins



---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer



* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.



---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
…hen creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creatin tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
* add  sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the  auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* update

Signed-off-by: eharper <[email protected]>

* udpate

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
@Victor49152
Copy link
Collaborator Author

jenkins

github-actions bot and others added 7 commits February 20, 2024 16:09
* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile



* Update Jenkinsfile



* Update Jenkinsfile



---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin chages.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accomendate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
…8482)

* Add settings to suppress bf16 compile errors in CI on V100



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* fix chunk infer bug

Signed-off-by: stevehuang52 <[email protected]>

* add support for duration=None, add lhotse support for relative audio path

Signed-off-by: stevehuang52 <[email protected]>

* add tests

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: stevehuang52 <[email protected]>
@Victor49152
Copy link
Collaborator Author

jenkins

@Victor49152 Victor49152 changed the base branch from main to r1.23.0 February 23, 2024 00:27
Comment on lines +147 to +150
# if logprobs:
# return logits_list
# else:
# return hypotheses, all_hypotheses

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
Comment on lines +200 to +202
# if logprobs:
# for logit, elen in zip(logits, encoded_len):
# logits_list.append(logit[:elen])

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
Comment on lines +608 to +609
# for idx in range(logits.shape[0]):
# current_hypotheses[idx].y_sequence = logits[idx][: logits_len[idx]]

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
for idx, logit_np in enumerate(transcript_logits_list):
log_prob = logit_np.cpu().numpy()

Check warning

Code scanning / CodeQL

Variable defined multiple times Warning

This assignment to 'log_prob' is unnecessary as it is
redefined
before this value is used.

# Numpy array test
with pytest.raises(NotImplementedError):
outputs = asr_model.transcribe(audio, batch_size=1)

Check notice

Code scanning / CodeQL

Unused local variable Note test

Variable outputs is not used.
raise ValueError(f"Unknown prompt format: {self.asr_model.prompt_format}")
return torch.tensor(tokens, dtype=torch.long, device=self.asr_model.device).unsqueeze(0) # [1, T]

def read_audio_file(self, audio_filepath: str, delay, model_stride_in_secs, meta_data):

Check warning

Code scanning / CodeQL

Signature mismatch in overriding method Warning

Overriding method 'read_audio_file' has signature mismatch with
overridden method
.
@@ -12,6 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from collections import OrderedDict

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'OrderedDict' is not used.
# See the License for the specific language governing permissions and
# limitations under the License.

import copy

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'copy' is not used.
padding_mask_batch.append(padding_mask)

if not self.cfg.bert_binary_head:
types = None

Check notice

Code scanning / CodeQL

Unused local variable Note

Variable types is not used.

return fwd_output_and_loss_func

def loss_func(self, output_tensor):

Check warning

Code scanning / CodeQL

Signature mismatch in overriding method Warning

Overriding method 'loss_func' has signature mismatch with
overridden method
.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.